Distance Measures for Effective Clustering of ARIMA Time-Series

نویسندگان

  • Konstantinos Kalpakis
  • Dhiral Gada
  • Vasundhara Puttagunta
چکیده

Many environmental and socioeconomic time–series data can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time–series ARIMA time–series. We consider the problem of clustering ARIMA time–series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of time–series for clustering ARIMA time–series, by using the Euclidean distance between the LPC cepstra of two time–series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time–series. For example, few LPC cepstral coefficients are sufficient in order to discriminate between time–series that are modeled by different ARIMA models. In fact this approach requires fewer coefficients than traditional approaches, such as DFT and DWT. The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time–series using the Partition Around Medoids method with various similarity measures. We present experimental results demonstrating that using the proposed measure we achieve significantly better clusterings of ARIMA time–series data as compared to clusterings obtained by using other traditional similarity measures, such as DFT, DWT, PCA, etc. Experiments were performed both on simulated as well as real data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

متن کامل

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...

متن کامل

Fuzzy clustering of time series data: A particle swarm optimization approach

With rapid development in information gathering technologies and access to large amounts of data, we always require methods for data analyzing and extracting useful information from large raw dataset and data mining is an important method for solving this problem. Clustering analysis as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...

متن کامل

Time series forecasting of Bitcoin price based on ARIMA and machine learning approaches

Bitcoin as the current leader in cryptocurrencies is a new asset class receiving significant attention in the financial and investment community and presents an interesting time series prediction problem. In this paper, some forecasting models based on classical like ARIMA and machine learning approaches including Kriging, Artificial Neural Network (ANN), Bayesian method, Support Vector Machine...

متن کامل

Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?

Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve the predictive performance of each individual model. It is especially occurred when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001